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Abstract 

In the context of the analysis of measured data, one is often faced with the task to 
differentiate data numerically. Typically, this occurs when measured data are con- 
cerned or data are evaluated numerically during the evolution of partial or ordinary 
differential equations. Usually, one does not take care for accuracy of the resulting 
estimates of derivatives because modern computers are assumed to be accurate to 
many digits. But measurements yield intrinsic errors, which are often much less 
accurate than the limit of the machine used, and there exists the effect of "loss 
of significance" , well known in numerical mathematics and computational physics. 
The problem occurs primarily in numerical subtraction, and clearly, the estimation 
of derivatives involves the approximation of differences. In this article, we discuss 
several techniques for the estimation of derivatives. As a novel aspect, we divide 
into local and global methods, and explain the respective shortcomings. We have 
developed a general scheme for global methods, and illustrate our ideas by spline 
smoothing and spectral smoothing. The results from these less known techniques 
are confronted with the ones from local methods. As typical for the latter, we chose 
Savitzky-Golay filtering and finite differences. Two basic quantities are used for 
characterization of results: The variance of the difference of the true derivative and 
its estimate, and as important new characteristic, the smoothness of the estimate. 
We apply the different techniques to numerically produced data and demonstrate 
the application to data from an aeroacoustic experiment. As a result, we find that 
global methods are generally preferable if a smooth process is considered. For rough 
estimates local methods work acceptably well. 
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1 Introduction 



The estimation of derivatives from numerical data is a classical problem which 
occurs in many problems of data analysis [1]. Applications range from biol- 
ogy [2,3], and chemistry [4] to a variety of problems in physical applications 
[5,6,7]; various mathematical aspects are discussed in [8,9,10,11]. Surprisingly, 
experimenters and numerical data analysts often use very crude techniques 
for the estimation of derivatives and the topic is not discussed in depth in the 
physics community Despite the relatively simple strategies to enhance results 
for numerical differentiation, improved techniques are rarely used. With this 
contribution, we formulate a general scheme to distinguish the existing meth- 
ods and discuss in detail their differences. Our emphasis is on accuracy and 
smoothness, so wherever one is not bothered with very accurate estimation or 
with very noisy data, our text is understood to be informative. However, in 
critical applications with a high standard on accuracy of the data, this paper 
can guide along the numerical subtleties of the estimation of derivatives with 
a minimum loss of accuracy. The task we try to analyze is to differentiate 
numerically a data series with given accuracy in such a way that the loss of 
accuracy is minimal and the smoothness of the result, defined later in this text, 
is maximal. In the following, we present typical techniques and algorithms in a 
computation-oriented way, referring to the literature for mathematical details. 

Whereas mathematically, the derivative of a function is obtained by a limit 
process involving infinitesimal calculus, this can never be realized for data 
measured by digital equipment due to the intrinsic discretization of the data. 
In addition, real measurements yield data with noise, either due to the device 
properties or due to finite resolution, e.g., when sampling the data. An estimate 
for a derivative of some data y(x) with respect to the argument x has to cope 
with these two restrictions. Noise and discretization effects yield errors in the 
data y, the finite sampling interval does not allow the limit Ax — > and thus 
produces numerical errors, too. 

Typically, to obtain an estimate for the derivative y'(x), one tries to approxi- 
mate the measured data y best (in an exactly specified sense). One then hopes 
that the derivative found from this approximation is as well best. In this ar- 
ticle, we want to obtain maximal accuracy, but as well the smoothest result 
possible. One can divide the existing techniques into local and global strate- 
gies. For the first, one fits or interpolates a function through the data points 
locally, as for a running window; in the second case, the fit or interpolation is 
global. A typical example for global fits are smoothing splines as an enhance- 
ment of the well-known interpolating splines [12]. In contrast to interpolation 
a spline is put through many data fulfilling a least-squares condition. This 
way, one can take into account a natural property of physical variables - their 
smoothness, which is very often assumed for theoretical or experimental rea- 
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sons. Taken into account this criterion, differences between methods can be 
clearly noticed. The quantification of the smoothness property is novel to the 
numerical differentiation, we believe that it is an important and useful crite- 
rion. Another quantification of the quality of the estimate than smoothness is 
the least-squares error. A consistent formulation including a smoothness pa- 
rameter yields a minimization problem, mathematically known as Tikhonov 
regularization [11,13]. 

Sometimes, a kernel smoother is applied before fitting the data. This is a quite 
crude technique and we can not recommend this method, since two regression 
methods are combined and one carries the disadvantages of both methods. 
Furthermore, most methods work better with the original data [14]. But we 
must also note, that some combinations of kernel smoothers and regression 
methods might be successfully applied to some special problems. 

In this article, we compare four different techniques: finite differencing [15], 
the Savitzky-Golay filter [14], smoothing splines [12] and a spectral technique 
[5,14]. Whereas the first two are local methods, the latter are global ones. 
Local methods typically produce non smooth functions, whereas for global 
methods the smoothness is controlled in a well-defined way. We compare the 
methods by two numerically produced data sets where the noise in the data 
can be controlled and the derivative is known either analytically or numerically 
with high accuracy. Finally, we apply the methods to a set of data from an 
aeroacoustic measurement [16]. 

This article is organized as follows: In Section 2, we give a brief overview of the 
topic and details on local and global methods. Results for the implementation 
of the methods are given in Section 3 for the three above mentioned examples. 
The article concludes with Section 4. 



2 Estimation of Derivatives 

The estimation of derivatives is frequently encountered in the numerical in- 
tegration of ordinary differential equations [14, Chp. 5,16]. A function f(x) 
is given and one has to compute its derivative. For this task, the typical es- 
timators like finite difference schemes, spectral or kernel methods are used 
[15,14,17] in dependence on the required accuracy and speed of the numerics. 
The key problem concerning accuracy is the cancellation of digits with numer- 
ical subtraction, mentioned above. An introductory analysis is given in [14, 
Chp. 5], a more detailed exposure is found in [14,18,19]. 

A conceptually different problem arises if one wants to estimate derivatives 
from measured data without knowing the underlying function. Then, one needs 
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first a "good" estimate for f(x) and then another "good" method to extract 
the derivative. What "good" means can vary heavily from data set to data 
set. The algorithms and methods one uses to solve this problem depend on the 
requirements the analyst imposes on the data. E.g., it might be unimportant 
for an experimenter if his result is accurate up to 2 or 10 digits because he 
only needs a graphical representation of the results. On the other hand, some 
applications, like differential embedding [20], might require the knowledge of 
several derivatives at high accuracy for further analysis of a dynamical system. 
Then, usual techniques, as given in [14] very quickly meet their limits regarding 
accuracy and smoothness of a numerically calculated derivative. 

In the mathematical literature, the problem has been analyzed as inverse prob- 
lem under numerical aspects [11,14,21]. Here, we follow the rigorous ideas of 
[11] and present a classification of the schemes into global and local ones; we 
illustrate the differences by typical examples. Our focus is on the smoothness 
of the estimated derivatives, because a description of a physical system typi- 
cally requires functions and derivatives to be smooth up to a certain order. For 
example, most Hamiltonian systems are smooth systems, since their solutions 
are mostly required to be C 2 . 

In the following, we consider a data series y(x), measured at N points (yi,Xi) 
(i = 1, N). Due to measurement accuracy and noise sources the data have 
errors and we assume that our measured values consist of a deterministic part 
f(x) and a noisy part rj(x), such that the measured data are y(x) = f(x) + 
7)(x). This already indicates that the true function f(x) should be found using 
regression techniques. Since / is generically nonlinear, linear regression [14] is 
not applicable and nonparametric methods should be used; for an introduction 
see [22]. Please note that this is a much more complex problem than the 
estimation of parameters ctj, % = l,...,n by iterative methods if the form 
of f(x, ai, ■ ■ ■ , &n) is known, as, e.g., the standard Levenberg-Marquardt one 
[14,23,24]. We assume further that / is a smooth function and that the data 
series yi(xi) is given on a uniform grid Xi = a + iAx, i = 0, 1, 2, . . . of length 
L = b — a in the interval a < X{ < b. Now, we are interested in the estimate 
for the derivatives f'(x) = df/dx from the measured data. Throughout this 
article we will denote the estimate of the derivative by f'(x) and accordingly 
the estimate of the function by f(x). 

After having experimented with many data sets and methods, we found out 
that the techniques commonly used can be distinguished in a simple way 
as local or global approximation methods. A local method approximates the 
function f(x) in some neighborhood U(x) of x, and does not yield information 
beyond. This means especially, that there can be jumps in the approximated 
function between different x values. A global method includes information 
about all points in the estimate for f(x) and thus jumps can be avoided if 
smooth functions are used for the approximation. 
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The basic idea in all the methods is to approximate the function f(x) with the 
hope that then the derivative can be estimated. More explicitly one assumes 
the following: if the approximation / is close to the original function /, then 
the derivative /' is also close to /'. In mathematical terms, if ||/ — /|| = min, 
then ||/' — /'|| ~ min, with || • || a suitable norm. The estimation is best in 
the sense of the applied norm. Usually, the L2 norm is chosen and one has to 
solve a least-squares problem [14,23,24]. Common methods yield functions / 
obtained either by local or global interpolation, or a fit [14] . In an interpolation 
problem the resulting function is required to go through the data points, i.e. 
f(xi) = z/i] a fit produces the function such that J2j \\f( x j) ~ Uj\\ — rain. 
Implicitly, that means that f(xi) is not necessarily identical to yi. A local 
method acts on a subset of the data (j G (i — ni, % + n r )), with n r , njGiVa 
global one on all available data. Most of the methods can be extended to give 
approximate function values or derivatives not only at the points Xi but on the 
whole interval a < x < b. At the boundaries some methods are problematic 
because, e.g., the statistic changes, or the problem is no longer well posed. 



2. 1 Local Methods 



Local methods work by fitting or interpolating for each Xj a function f(x) 
on some sub-interval / 3 x of the domain. Obviously, this does not guaran- 
tee that the function is smooth, because the measurement errors can yield 
jumps (from interval to interval). Probably the best known method is given 
by finite differences; it results in a natural way when differential operators 
are discretized. This technique has turned to a huge field in connection with 
numerical integration of partial differential equations [15]. Depending on the 
problem, different schemes can be used and the choice of the right difference 
scheme can have enormous impact on the result of an integration [25]. The 
basic idea is simple: the function / is interpolated by a polynomial of order m. 
For symmetric differencing, it is put through the points Xi-k, ■ ■ ■ , Xi, . . . , Xi + k, 
where m = 2k is a positive integer, asymmetric schemes, like the well-known 
upwind schemes work in a similar way. The derivative at the point x is then the 
derivative of the interpolated polynomial. Finite differencing is local, because 
the approximation of the derivative depends only on the 2k + 1 points in the 
neighborhood of X{. The polynomial can be obtained from Taylor expansion, 
Pade approximation or similar schemes [25]. The finite difference estimator 
for a Taylor expansion has the form [15] 



with coefficients 




For other schemes, different coefficients are used. In that case (2) or the right 
hand side of (1) can contain derivatives. It is also possible to vary the step 
width Ax. To do so, one replaces Ax by 5x = I Ax (I e IV) and j by / = Ij 
in (1). 

If the data are noise-free, the error e = \\f — f'\\ in (1) is of order 0(Ax m ). 
This means, with a fine sampling one can arrive at accurate approximations. 
Numerically, however, one is faced with the unavoidable problem of accuracy 
loss in numeric subtraction due to the loss of significance of digits [18,19]. 
Subtraction of two floating point numbers is ill conditioned if the value of two 
numbers is approximately equal. This is typical for the scheme (1). Especially 
if measurement noise is present, it can easily travel to the leading digits and 
render the results meaningless; i.e. finite differencing is a bad choice for nu- 
merical differentiation. One can consider the problem as a trade-off between 
numerical inaccuracy due to subtraction and analytical need for small values 
of Ax for the approximation (1) to hold sufficiently well. 



A consequent error analysis including the data accuracy 5 = \JVAR(rj), with 
r) the measurement noise process, yields the error of a first-order symmetric 
finite difference scheme (k — 1) in Eq. (1) [11]: 

e ~ 5x + 5/5x . (3) 

A minimal error e ~ 2\/~5 is found for 5x ~ \f~5~. With given data of sampling 
interval Ax <C y/5 one needs to discard some points to achieve the minimum 
and uses 5x = I Ax This is not satisfying, because the information contained 
in the left-out points in the interval .., i/i+i is thrown away. One would 
like to use a scheme which has minimal error, but uses all points with the 
intention to go beyond the bound 25 for the error of the estimate. 

This can be achieved by using a fit of a polynomial of order m < 2k through 
all data in the interval (xj-fc, Xj+fc). This method is known as Savitzky-Golay- 
filtering and is widely used in data analysis. The domain has not to be sym- 
metric (as in generalized finite differencing), one defines a neighborhood by an 
interval (i — ri[,i + n r ) with ri[ + n r + 1 points, and rii not necessarily equal to 
n r . A linear regression is used to find the best polynomial fit of order m to the 
data, and the derivative is obtained from the coefficients of the polynomial. 
This procedure is repeated for every data point, like for a moving window. 
The key idea of Savitzky-Golay filtering is the conservation of higher statis- 
tical moments. A simple moving average always reduces the height of a local 
extremum. Due to the mentioned conservation property, the Savitzky-Golay 
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filter shows this reduction to a much less extent. Smoothness, however, is not 
guaranteed and the derivative can be discontinuous, which is not desirable for 
an estimate useful in physical problems where / is typically required to be 
smooth. A very nice and detailed discussion is found in the numerical recipes 
[14]. 



2.2 Global Methods 



Global methods yield an estimation f(x), defined on the whole interval a < 
x < b. Since the function is not known beforehand, it makes sense to use 
a representation by some basis functions, </>i(x), i G IN: f(x) = J2j a j(pj( x )- 
The estimation shall be best in the least squares sense, but as well smooth. 
Consequently a minimization problem with a side condition for smoothness is 
formulated. The coefficients aj are determined accordingly. The choice of the 
basis depends on the properties on /, either known by prior knowledge oder 
imposed a posteriori. For instance, it might be clear from the experiment that 
/ G C 2 is smooth (in that the second derivative exists), or continuous only, 
or periodic on the interval, or has other restrictions which are known to the 
modeler. 

To quantify the global smoothness of a function g{x) one uses its curvature 
[26] 

8 (g) = [ b \\g"(x)\\ 2 dx. (4) 

J a 

Maximizing the smoothness of an estimation amounts to minimizing its curva- 
ture. The function estimate f(x) is then determined by the usual least-squares 
minimization problem with the additional smoothness constraint. The amount 
of smoothing is controlled by the smoothing parameter A, which enters the 
minimization problem: 

X 2 = £{W - fa)} 2 + A f\f"{x)Y dx = min. (5) 

For simplicity, we have assumed equal measurement uncertainty for all data 
points <7j = const and neglected it in the formulas. The first term measures 
the least squares error of the fit, while the second term penalizes curvature in 
the function. Equation (5) is a a typical bias- variance problem [27], and the 
best choice of the smoothing parameter is nontrivial. Numerically, it can be 
determined using generalized cross-validation [28] . 

As mentioned above, / is usually represented by a superposition of some ba- 
sis functions with according coefficients, a,-. This is inserted into (5). The 
minimizing coefficients, a,j, are determined by a variational principle. As a 
consequence the conditions dx 2 /ddj = have to be fulfilled and the resulting 
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set of equations needs to be solved (see App. A). If we require / G C 2 to be 
twice differentiate, natural cubic splines are an obvious choice. For periodic 
functions, or / G C°°, a spectral representation suits well. In other situations, 
other basis systems might be favorable. In the following we consider smoothing 
splines and Fourier representation. 

We want to hint at this point that smoothing splines are different from the 
concept of interpolating splines. This is a very important point for the under- 
standing of the following. Interpolating splines fit a polynomial through every 
data point - in our case a horrible scenario, since we are faced with noise. 
Smoothing splines fit a low-order polynomial through bunches of data, such 
that the resulting approximation is best in a least-square sense, cf. Eqs. (4) and 
(5). Summarizing, we note that the smoothing and the interpolating splines 
have the representation as low-order polynomial of nth order (typically 3rd 
order) in common. Differences lie in the construction. Whereas interpolation 
puts a smooth line between points, smoothing splines use regression to put 
the best "line" (or a n-dimensional surface, in general) through many points. 
Both, interpolating or smoothing ones, are global, because there is no jump in 
the nth-order spline over the whole definition range - nor in the n-1 possible 
derivatives! Furthermore, the smoothing spline is constructed from a global 
measure (4). This measure couples the local spline coefficients. 

In [11], it has been shown rigorously that the minimizer of (5) is a natural cubic 
spline, provided the function / is square integrable. The spline representation 
used throughout this paper reads 

n+2 

/(,-) (6) 

3=0 

where 7^ are the coefficients of the cubic B-spline basis functions Bj (x) and n is 
the number of knots for constructing the smoothing spline. After the solution 
of the minimization problem one calculates the derivative analytically from the 
basis functions. The property, important from a fundamental point of view is 
the smoothness of the splines. As shown in [11], e ~ V6 for Ax — > 0, which 
is superior to the accuracy (3) for finite difference methods, especially in the 
case of very fine sampling. 

In the case of a spectral estimate, one writes 

n/2 

/» = E c k e^ L (7) 

k=-n/2 

to be inserted into (5). As a result, a system of equations is obtained for the 
coefficients G C. If n = N and A = 0, the data are exactly interpolated 
(Fourier transformed), for n < N and/or A 7^ 0, a spectral smoothing problem 
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results. This can be solved (see App. A), but we will not consider this case. 
Instead of solving the complete smoothing problem in spectral space we follow 
a slightly different strategy: many experiments suggest that the noise sits 
predominantly in the high frequencies. Then a low-pass filter can be applied. 
A well-behaved standard filter is the Butterworth filter [29], which reads for 
the m-th order: 

B(k,k ) = -L^, (8) 

where k is the frequency, ko is the cutoff-frequency and m defines the steep- 
ness of the filter. The two latter parameters must be related to the noise in 
the data. In the case of white noise, ko can be chosen just above the last 
k which occurs from the dynamics of the system, the steepness determines 
how "fast" the noise amplitude is damped away spectrally after the cutoff 
frequency, this can be varied in each case, typical values lie between 6 and 
10 [14]. So one simply performs a Fourier transformation, applies the spectral 
filter and transforms back. The derivative is then obtained in spectral space 
by multiplication with (i2nk)/L (remember that y'(x) = J2k^j^ c ke t2wkx ^ L ). 
The complete representation of / is 



k=N/2 k=N/2 

f(x)= £ c k B(k,k )e^ L = £ 6 k e a * k *' L , (9) 

k=-N/2 k=-N/2 



where c& are the coefficients obtained by Fourier transformation and 6\. = 
CkB(k, ko). It should be mentioned here that this representation also leads to 
a smoothing problem (5), cf. App. A. But instead of the computation of a 
complete set of coefficients dj one determines the cut-off frequency k optimal 
for a given A, since ko is the only parameter to be varied. This means that A 
and ko are equivalent and directly related as is shown in Appendix A. As an 
alternative to the Butterworth filter, one can use the Wiener filter, which is a 
good choice for linear processes. For scaling systems [30], wavelet transforms 
[31,32] are a preferred choice. This underlines one obvious feature: the more 
information about the underlying process is known, the more details one uses 
for constructing the best method to estimate derivatives. 

Both ways to solve Eq. (5), spline and spectral method, yield n equations for 
n unknowns (n being the number of knots for the smoothing splines or the 
number of basis function in the spectral method). The equations are overde- 
termined, since N > n data points are available. This is resolved by the sum 
in the minimization procedure which eventually yields an n x n matrix. The 
procedure is similar to the usual linear regression [14,26]. If a Butterworth 
filter is applied, the matrices are reduced to dimension 1. 
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3 Numerical Results 



We compare the above methods by three examples, two using numerical data, 
one using experimental data: 1) the sine function, 2) the Lorenz system [33] in 
a chaotic state and 3) data from an acoustic measurement [16]. We quantify 
our results with the mean square error of the estimate of the first derivative 
and its smoothness (or equivalently its curvature). The dependence on the 
parameter of the methods are discussed in detail. 

The variance 

A =< (f'( Xi ) - fix,)) 2 > (10) 
is given as an overall measure for the results. The derivative is known exactly 
for the numerical data. As a second measure we use the curvature s(f) from 
(4) or the difference 

S=( 8 {f)- 8 {f)Y=^\f"(x)\ 2 dx- j'lnx^dxj . (11) 

Other measures can be used, e.g., correlations or a norm different from L2. We 
would like to point out that the characterization of results by (10) is common 
to local and global methods, but the latter use as additional constraint the 
smoothness. One of the main concerns of this publication is hint to the im- 
portance, from physical arguments and mathematical considerations, of this 
property. 

For the finite differences we choose a second order method, where the pa- 
rameter to be varied is the width 8x. The approximation of the derivative 
reads 

/df\ _ 4 y i+l - yj_i 1 y i+2 i - yi-21 , . 

\dxJi 3 25x 3 A5x 1 ' 

where / is a positive integer determining the step width and 5x = lAx. The 
Savitzky-Golay filter was of fourth order; parameter dependence on the win- 
dow size rii = n r = n has been investigated. The spectral estimator has as 
parameters the cut-off k and the number of basis functions, which is set here 
to n = N. The splines have the number of knots, n, used as parameter. The 
knots are, for the ease of use, space equidistant. Additionally, for splines and 
spectral method, one has the smoothing parameter to be varied. In the spec- 
tral case the variation of A is equivalent to the variation of fco- According to 
the above, we approximate the function and then determine the derivative by 
one of the methods under consideration. 

A comparison of the used methods requires a scaling of the parameters. For the 
finite difference method we use the quantity wpp = 41 Ax = 4Sx the distance 
between the leftmost and the rightmost point in the considered domain. For 
Savitzky-Golay filter we choose the window size wsg — (2> n + l)Ax. wsg 
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Fig. 1. Difference A/'(xj) = /'(xj) — f'(xi) of the estimate and the exact deriva- 
tive f'(x) = cos(x). Measurement noise was simulated by a Gaussian white noise 
process with standard deviation a = 0.5. From bottom to top: 1. second order fi- 
nite difference method (5x = 1.55, wfd = 6.18) 2. Savitzky-Golay-filter (n = 143, 
wsg = 3.61), 3. spectral method (ko = 2.99, ws = 2.09) and 4. smoothing splines 
(n = 7, A = 0.13, wsm = 0.89). Parameters for the methods were chosen such 
that they were optimal in the sense of Eq. (10). The offsets 0, 2, 4, 6 are added 
respectively for better visibility. 

which corresponds directly to Wfd- For the spectral method, we define the 
window size by the wavelength of the cut-off, w$ = L/k (since n = N). 
For the smoothing splines, finally, the typical window can be described by 
Wsm — L/n, the distance between two knots. 

3.1 Sine Function 

As a first example we use the function f(x) = sin(x) defined on the interval 
< x < 2n with added Gaussian white noise rji with standard deviation o~ 
and zero- mean. The data set consists of 500 Points yi = f(xi) + rji, so that 
Ax = 27r/500. In Fig. 1 the result for the derivative estimate is displayed 
(a = 0.5). Finite differences (Sx = 1.55, w FD = 6.18) yield an unacceptable 
result with extreme fluctuations of the order of 0.1. Also, the result for the 
Savitzky-Golay filter (n = 143, wsg — 3.61) is not smooth and deviates heavily 
at the boundaries. Spectral (ko = 2.99, ws = 2.09) and spline (n = 7, A = 0.13, 
wsm = 0.89) methods, apparently work better. 

Up to now, we showed the result for a specific set of parameters for each 
method. For a complete analysis, the dependence of the different methods on 
their parameters needs to be investigated. We varied the parameters over a 
wide range and considered the mean square error (10), cf. Fig. 2. For every 
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(a) (b) 

Fig. 2. Parameter dependence of the mean square error of the derivative of sin(x) 
with additive with noise a = 0.5. (a) For the finite differences, Savitzky-Golay-filter 
and the spectral method. • - finite difference method, the according parameter is 
wfd, B ~~ Savitzky-Golay-filter (wsg), ° ~~ spectral method (u>s). (b) mean square 
error for the smoothing splines in dependency of wsm and A. 

method a minimum occurs at the point of optimal approximation in the least- 
squares sense. 

For the finite difference method we varied the width 5x. The optimal width 
is 5x = 1.55 (wfd — 6.18), nearly 1/3 of the domain. Smaller spacing results 
in larger fluctuations, whereas an increasing spacing will not approximate the 
desired derivative. It is clearly visible from Fig. 1 that finite differences show 
the strongest fluctuations among all considered techniques. In practice finite 
differences should not be the first choice. 

For the Savitzky-Golay-filter we varied the window size to find an optimum at 
n w 143 (w S g = 3.61). For a larger window the filter smoothes the function too 
much, and the result tends to a constant. For a smaller window the influence 
of noise becomes locally more important. 

For spectral differentiation, the filter cut-off k can be varied from 0-250 to 
determine the optimal cut-off. We find /c ~ 3 (ws = 2.09). The original func- 
tion, sin(x) implies that only k — 1 is active in spectral space; for a top-hat 
filter with sharp edges this would result in ko — 1. Because B(k, ko) is smooth, 
a slightly larger cut-off is found. For higher cut-off values the approximation 
increasingly oscillates around the optimal solution. For spectral differentia- 
tion, in general, problems near boundaries occur, if the data are not perfectly 
periodic. In this case some data points close to the boundaries should be dis- 
carded after the determination of the derivative, but it might be better to 
switch to splines or other basis functions. 

For smoothing splines, the mean square error depends on the number of used 
knots and the smoothing parameter. The optimal values are A ~ 0.13 and 
k = 7 (wsm — 0.89). When the smoothing parameter is increased the estimate 
consistently tends to a constant, when it is decreased the estimate represents 
a bigger part of the noisy fluctuations. The meaning of the number of used 
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(a) 



CO 




(b) 



Fig. 3. Characterization of the derivative estimate of the sine data for different noise 
levels, a) least squares error, b) curvature S. On the x-axis the noise level of the 
white Gaussian noise added to the signal is plotted. • - finite difference method, ■ 
- Savitzky-Golay-filter, ▲ - smoothing splines, o - spectral method. 



knots for the smoothing splines can be understood as the degree of freedom of 
the smoother; if there is, e.g., one oscillation in the data, one needs minimal 3 
knots for approximation. More oscillations require more knots, corresponding 
to a higher resolution, e.g., 10 oscillations can not be resolved with a smoothing 
spline of 10 knots; for details see [12]. In this sense the degree of freedom of 
the smoother can be understood as ability to fit a given number of oscillations 
in the data. If the number of knots exceeds the number of oscillations, one is 
faced with the bias-variance problem that all methods have in common. 

For a direct comparison of the investigated methods for numerical differenti- 
ation we studied the dependence of the mean square error (10) on the noise 
level a for each method. Results are shown in Fig. 3a. Finite differences have 
an error about one order of magnitude larger than the other methods. The 
spectral method works very well for a small noise level, with an error of one or- 
der smaller than the Savitzky-Golay-filter and smoothing splines. For a higher 
noise contamination, however, smoothing splines are the best choice, whereas 
spectral methods and the Savitzky-Golay-filter are comparable by means of 
(10). Nevertheless, from a mathematical point of view, the Savitzky-Golay 
method is a local approximation, with no smooth relation between y'{xj) and 
y'(x i+ i). This is also visible in Fig. 1 where at several points jumps can be 
observed. 



To quantify the intuition on smoothness from the graphs (Fig. 1), we studied 
the dependence of the curvature S on the standard deviation of the noise, 
cf. Eq. (11). The estimate was interpolated by a spline and then S(f) has 
been computed from (4) by averaging \f"(x)\ 2 over each interpolated data 
point. The interpolation is necessary to determine the second derivative /". 
One obtains s(f) analytically, so that S is easily determined. The results are 
shown in Fig. 3b. The curvature of the Savitzky-Golay-filter is some orders 
of magnitude worse than the one for the spectral method and the smoothing 
splines. Finite difference techniques are worse than Savitzky-Golay-filter, so 
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Fig. 4. Time dependency of the x-component of the Lorenz system, with additive 
with noise a = 2.0. Also shown are the results of the regression: Savitzky-Golay filter 
with wsg = 0.47 (bottom), spectral method ws = 0.25 (middle) and smoothing 
splines wsm = 0.13 (top). An offset is added for better visibility. 

that we did not consider it here. The main difference between the methods is 
the different behavior of the smoothness, or the curvature, respectively. While 
for the Savitzky-Golay-filter one can clearly see a linear relation between noise 
and smoothness, for the spectral method and the splines the relation is roughly 
a constant, up to a certain, parameter-dependent noise level. One recognizes 
large fluctuations of S, resulting from over-smoothing the fit. 

3.2 Lorenz System 

As a second example we analyzed the x-component of the Lorenz system 



We integrate the system numerically by a Runge-Kutta algorithm of fourth 
order with time-step 0.01 and parameters a = 10, R = 28, b = 8/3. The 
integration was performed over 500 steps only. We also added Gaussian white 
noise to the data. The x component of the trajectory of (13) is shown in Fig. 4. 
Results for the dependence of the mean square error on the level of noise are 
similar to the previous case, f(x) = sin(x) (see Fig. 6). For small noise the 



(Fig. 4), 



x = a(y — x) 
y = Rx — y — xz 
z = —bz + xy. 
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Fig. 5. Parameter dependence of the least squares error for the x-component of 
the Lorenz system with additive white noise a = 2.0 (a) For the finite differences, 
Savitzky-Golay-filter and the spectral method. • - finite difference method, the 
according parameter is wfd, B - Savitzky-Golay-filter (wsg), _ spectral method 
(ws)- (b) least squares error for the smoothing splines in dependence on the number 
of knots k and the smoothing parameter A. 

spectral method provides the smallest approximation error. Finite differences 
are here of the same order of magnitude. 

Because in a chaotic time series, many scales are present with a broad spec- 
trum, the Lorenz system is a good candidate for a study of the parameter 
dependence of the estimates under the aspect of a scaling system. We show 
the comparison of the methods in Fig. 5. The optimal parameters found here 
are different from the values for the sine. This is explained by the different 
number of oscillations. The data set for sin(x) contains exactly one oscillation, 
whereas the data for the Lorenz system contains 6 oscillations. As mentioned 
above the parameters must be chosen such that the bias-variance trade-off 
encountered. 

For the Savitzky-Golay-filter, wsg = 3.61 for sin(x), whereas wsg = 0.47 for 
the Lorenz system. This difference can be explained by the fact that large 
window sizes will use many points for fitting a polynomial of 4th order to 
each data point. For data sets with many oscillations this will result in bad 
approximation of the derivative or the function. For the smoothing splines 
wsm = 0.89 for sin(x), whereas for the Lorenz system wsm = 0.13. To resolve 
many oscillations, obviously more knots are needed. In the spectral case the 
dependency of the cut-off is ws = 2.09 for sin(x) and ws = 0.26 for the 
Lorenz system. Summarizing this subsection, the results for the much more 
complicated chaotic time signal match the ones for the quite simple sine signal. 
This can be interpreted as a sign for the generality of the results. 

For a comparison of numerical differentiation methods with different data se- 
ries one usually rescales the x axis (in case of the Lorenz system the time t) 
by x = xN pea k/L, where iV peo fc is the number of peaks. Then, the regression 
parameters wfd, wsg, wsm and ws are of the same order. But furthermore it 
should be noted, that every data series possesses characteristic edges, which 
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Fig. 6. Dependence of the least squares error and the smoothness on the additive 
noise standard deviation for the x-component of the Lorenz system, Eq. (13). a) 
least squares error, b) curvature S. On the x-axis the noise level of the white Gaus- 
sian noise added to the signal is plotted. ■ - Savitzky-Golay-filter, A - smoothing 
splines, o - spectral method. The parameters for the fit are chosen to be optimal, 
cf. Fig. 5. 

will naturally result in different parameters. If the time series also consists of 
intermittent domains, the situation is even worse. Then N pea k/L is not a mea- 
sure for the mean 'wavelength' of the data, and the domains, containing the 
oscillations would be over-smoothed. A detailed discussion of such phenomena 
goes far beyond the scope of this article. 



3.3 Experimental Data 

As the last example we analyzed experimental data from the acoustical signal 
emitted from the mouth of an organ pipe. This measurement is needed, if the 
complex acoustical system of an organ pipe shall be modeled as a nonlinear 
oscillator [34] to be found numerically by nonparametric data analysis [35,36]. 
Basically, the physics of the measured data it is not very important for our 
purposes and we will not comment further on the origin of our data, details 
are found elsewhere [16]. The time series y(t) consists again of 500 points, the 
sampling interval is At = 1/44100 s. In contrast to the previous examples, 
we do not have separate access to the derivatives and the measures (10) and 
(11) can not be applied to determined the regression parameters. So, one 
needs to estimate the optimal parameters. In principle, there are methods like 
generalized cross-validation dealing with this problem [26,28,37,38]. We will 
now explain briefly the functioning of generalized cross-validation (GCV) to 
have a complete presentation, an explicit application to our data lies beyond 
the scope of this article and is subject to future work. 

GCV works by dropping one point (j/j, Xj) from the data (in general M points), 
the estimate is then based on the remaining N — 1 points. The GCV is con- 
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Fig. 7. Upper plot: data points of the organ pipe measurements and fitted function 
using Savitzky-Golay filter. The other methods are indistinguishable by eye. Lower 
plot: fitted derivatives. Solid line: Savitzky-Golay filter, dashed line: spectral filter- 
ing, dotted line: smoothing splines. The inset shows a part of the fitted derivative in 
the interval 0.04s < t < 0.05s Without quantitative analysis it is hard to say which 
curve is best, differences are however recognizable. The parameter used for fitting 
are wsg = 1-6 ■ 10~ 3 s for Savitzky-Golay filter, ws = 8.7 • 10~ 4 s for the spectral 
method and wsm = 3.2 • 10~ 4 and A = 2.6 • 10 -11 for the smoothing splines. 



structed by sum of squares 



1 N 

GCV(\) = -Y,{yi-fr(xi)} 2 - 

i=l 



(14) 



is the estimate for f(x) if the point (yi,Xi) is omitted. The optimal pa- 
rameter is calculated from GCV for a number of values of A over a suitable 
range, then the minimizing A is selected. In other words, Eq. (14) gives an em- 
pirical estimate of the combined systematic errors, induced by the choice of 
the parameters, and statistical errors due to noise in the data. The parameter 
is chosen to minimize that estimate. Note that this is done for the estimated 
function, not the derivative. 



Results for the estimates of function and derivative of the experimental data 
are shown in Fig. 7. For the experimental data we are faced with the problem 
that we do not know the true derivative. So, we have to compare the esti- 
mation, or construction of the function itself in terms of least-squares error 
e =< (f(x) — f(x)) 2 > and the corresponding curvature s. All methods ap- 
proximate the function quite well in terms of the least-squares criterion. As 
well the derivative seems well estimated, but slight differences are recognized. 
First, one notices the weakness of the spectral estimator at the boundaries; 
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as well splines and Savitzky-Golay filter should be considered with great care 
in this region, as we already saw in the previous sections. By eye, all three 
estimates appear indistinguishable. However, if we estimate the curvature we 
find great differences. 

We have two competing quantities in our considerations, the least-squares er- 
ror, and the curvature. To compare either of them, one should fix the other 
one, correspondingly we need to fix the least-squares error to obtain quanti- 
tative statements about s. cf. Eq. (4). We -quite arbitrarily- chose the value 
e = 2. Then, we searched in the whole parameter space of all the methods 
for fits with the least-squares error close to 2 and compared the according 
curvature. The Savitzky-Golay technique yields e = 2.04 and the correspond- 
ing curvature s = 2.05 • 10 15 , the window size wsg = 1-61 • 10 -3 . For the 
spectral estimator, we obtained e = 2.03 with a curvature of s = 6.16 ■ 10 13 at 
Ws = 8, 76-10 -4 . The spline estimator yeld an error e = 2.05 with s = 5.52-10 13 
and wsm — 3.23 • 10 -4 . As a result we find again that the global methods are 
smoother by two orders of magnitude in comparison with the Savitzky-Golay 
method. Obviously, as far as a smooth curve is concerned the global smoothers 
are superior to the local methods under consideration. If smoothness is not 
relevant, all three methods are equivalent if the boundaries are neglected. 



4 Conclusion 

We presented a qualitative and quantitative comparison of local and global 
methods for the numerical estimation of derivatives. Whereas local methods 
are appealing due to their simplicity and easy implementation, we advocate 
for global methods, because the properties of the functions can be defined in a 
proper way. We focused on the important constraint of smoothness (expressed 
by the curvature) and showed how a corresponding minimization problem 
is solved. Furthermore, we demonstrated that global methods are superior 
to local ones if high-precision estimates are needed or measurement noise is 
large. We did not want to consider in detail the computational cost. But it 
shall be mentioned that a global estimate can be expensive if the number of 
points exceeds 10 5 . Then, programming skill (or large memory) is required to 
encounter the problem of large matrices to be multiplied. 

We compared in this article finite differences, Savitzky-Golay filtering, smooth- 
ing splines and smoothing spectral estimators. To compare the methods, the 
dependence on parameters has been investigated, optimal parameters could 
be determined. The dependence on additive noise has been studied in detail 
and we found enormous differences in the methods. One result is that finite 
differences are orders of magnitudes off in comparison to the other methods. 
It is used as a worst case demonstration in this article. In terms of the least- 
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squares error the remaining three methods are comparable in the sense that 
they are of the same order of magnitude. However in terms of smoothness, the 
Savitzky-Golay filter fails by some orders of magnitude, and only the global 
methods work well. It is remarkable that we can compare the methods on a 
logarithmic scale, i.e., techniques differ really to a huge extent. A more detailed 
look shows that spectral estimators work very well for small noise levels. For 
high noise levels, smoothing splines yield better results for the investigated 
systems. 

Under a more general perspective, we showed how one can choose among 
some set of basis functions for the representation of the estimate. Obviously, 
for periodic functions a spectral representation is natural, similarly if one is 
interested in higher derivatives. If other boundary conditions are required, 
other basis functions might be favorable. Smoothing splines are an optimal 
choice if twice differentiability (or smoothness) is required, and no further 
information is available. For all basis systems the general procedure described 
above applies, formulated as a minimization problem. In principle one can 
imagine further constraints like minimal variance of the first derivative or other 
criteria. Those conditions can be easily built into the method as additional 
Lagrange multiplier. 

From a practical point of view one has to decide how important it is to obtain 
smooth functions to a reasonable accuracy. For a rough guess, a local filter 
might do, for any high-precision analysis the implementation of the minimiza- 
tion, or smoothing problem does pay off. E.g., if one wants to process further 
the obtained derivatives, small differences can yield enormous changes in the 
final results. Our interest started with an application in some reconstruction 
techniques [39], where local methods are by far too inexact. Similar holds for 
prediction problems where the integration of functions based on the estimate 
of the derivatives is important [40]. 
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A Appendix: Spectral smoother 

In the following, we assume that x e [0, 1] for the measured data (x n , y n ); n G 
1,...,N 

The spectral representation of a function reads 

JV-l 

nx) = y: c k ^ kx . (a.i) 

k=0 

and the smoothing term in Eq. (5) reads 

A f 1 \f"(x)\ 2 dx = \ f 1 f"(x)f"(xydx (A.2) 
Jo Jo 

JV-l ! 

= 16Att 4 V c k c\k 2 l 2 \ e i2 ^ x dx 

k,l=0 Jo 
JV-l 

= 16Att 4 ^ c k c* k k 4 , 



k=0 



where Jq 1 e t2w ^ k l ^ x dx = 8 k i is used. To solve the minimizing problem Eq. (5), 
we insert the above into 



X 2 = T,\yn-fM\ 2 + X \f"(x)\ 2 dx = mm., (A.3) 

n=0 J ° 
N , JV-l s , N-l v N-l 

= E - E c k e^ k *A (y n - £ 4e-^" + 16Att 4 £ c^A; 4 . 

n=0 ^ k =0 ' ^ k =0 ' k =0 

By variation of the coefficients we obtain the conditions 

§^ = 0,^ = 0. (A.4) 

dc k dc* k 

This yields the equations 

N / JV-l \ 

= - ei2nkXn [Vn ~ E c\e- i2 ^ lXn + 16tt 4 Ac* k 4 (A.5) 

n=l V ;=o / 

JV / JV-l \ 

= - E e " 2 ^ 2/n " E c^*" + 167r 4 Ac fe A; 4 (A.6) 

n=l V 1=0 J 

These are 2N linear equations for the 2N unknowns {c k ,cl} which can be 
solved by usual algebraic manipulation. 
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In the case of using the Butterworth filter (8), one determines the Fourier 
coefficients c k in the conventional way [14]. Then the filter is applied. The 
only possible variation is in fc and a single equation results. For simplification 
we write B(k, k ) = B k , dB(k, k )/dk Q = B' k and c k e t2nkXn = C kn and obtain: 



dv 2 

ik=° < A7 > 

N ,N-1 N-l N-l N-l s 

= - E E B' k C kn (y n - £ B t Cl) + £ B' k C* kn (y n - ]T B t C ln ) + 

n=l V k=0 1=0 k=0 1=0 ' 

N-l 

+32tt 4 A B k B' k k A c k c* k (A.8) 

k=0 

N , N-l N-l s 

= - E E B' k y n (C kn + C* kn ) - £ ^A(C fen Q; + C^C7 lB ) (A.9) 

n=l V fe=0 fc,/=0 7 

N-l 

+32vr 4 A B k B' k k 4 c k c* k (A. 10) 



k=0 



Using the definition of the Fourier components c k = J2n=i Vn e t2whXn this for- 
mula can be written as 



/ 2 A' l JV-1 

•9/co 



d\ 2 

! -27V ^ (^CfccJ + B' k B k c k cl) + 32vr 4 A ]T B k B' k k 4 c k c* k , 



k=0 k=0 

where we use l/iV£^ =1 e i2 < k ' l >" = 5 kh 1/NY££ e ,2lfc(l "" Im) = 5 nm and 5 nm 
the Kronecker delta. The factor iV can be avoided, if one scales k h- > 27i/Nk . 

We write the minimum condition as = — F + AC Then a simple relation 
A = F/G results, relating lambda to k . The inversion of this formula yields 
fco(A). 
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